A two-stage feature selection method for text categorization

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Two-Stage Feature Selection for Text Classification

In this paper, we focus on feature coverage policies used for feature selection in the text classification domain. Two alternative policies are discussed and compared: corpus-based and class-based selection of features. We make a detailed analysis of pruning and keyword selection by varying the parameters of the policies and obtain the optimal usage patterns. In addition, by combining the optim...

متن کامل

Categorical Proportional Difference: A Feature Selection Method for Text Categorization

Supervised text categorization is a machine learning task where a predefined category label is automatically assigned to a previously unlabelled document based upon characteristics of the words contained in the document. Since the number of unique words in a learning task (i.e., the number of features) can be very large, the efficiency and accuracy of the learning task can be increased by using...

متن کامل

A Knowledge-Based Feature Selection Method for Text Categorization

A major difficulty of text categorization is the high dimensionality of the original feature space. Feature selection plays an important role in text categorization. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), mutual information (MI), and so on are commonly applied in text categorization. Many existing experiments show IG is one of th...

متن کامل

A global-ranking local feature selection method for text categorization

In this paper, we propose a filtering method for feature selection called ALOFT (At Least One FeaTure). The proposed method focuses on specific characteristics of text categorization domain. Also, it ensures that every document in the training set is represented by at least one feature and the number of selected features is determined in a data-driven way. We compare the effectiveness of the pr...

متن کامل

Enhancement of DTP Feature Selection Method for Text Categorization

This paper studies the structure of vectors obtained by using term selection methods in high-dimensional text collection. We found that the distance to transition point (DTP) method omits commonly occurring terms, which are poor discriminators between documents, but which convey important information about a collection. Experimental results obtained on the Reuters-21578 collection with the k-NN...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Computers & Mathematics with Applications

سال: 2011

ISSN: 0898-1221

DOI: 10.1016/j.camwa.2011.07.045